Design and Monitoring of Adaptive Clinical Trials
The RALES (Randomized Aldactone Evaluation Study) trial was a landmark study comparing the aldosterone receptor blocker spironolactone (Aldactone) with placebo in patients with severe heart failure.
The psoriasis trial example involves:
- Primary Endpoint: achievement of PASI-75 (at least a 75% reduction in the Psoriasis Area and Severity Index score) by week 16.
- Design Parameters: 95% power to detect a 10% improvement in the PASI-75 response rate with the new treatment relative to placebo, assuming a placebo response rate of π_c = 7.5%, although this rate is uncertain.
The problem is that the trial's power depends on both the true placebo response rate (π_c) and the effect size (δ), neither of which is known at the design stage. If either is misestimated, the actual power will differ from the target, and the originally calculated sample size may turn out to be insufficient or excessive.
With an information-based design, the trial adapts by recalculating the necessary sample size from the accruing estimates of the placebo and experimental response rates. At each interim analysis j, the information actually accrued, \(J_j\), is compared with the pre-specified maximum information \(I_{\text{max}}\). If an efficacy or futility boundary is crossed, the trial can stop early; if \(J_j\) meets or exceeds \(I_{\text{max}}\), the final analysis is performed; otherwise the sample size is re-estimated so that \(I_{\text{max}}\) will be reached and the desired power maintained.
This approach proposes using “statistical information” rather than fixed sample sizes to guide the monitoring and conclusion of clinical trials. The rationale here is to accumulate enough information to make robust statistical decisions, thereby potentially making the trial more efficient and flexible.
This approach is particularly beneficial in scenarios like the psoriasis trial where there is considerable uncertainty about critical parameters that influence study outcomes. It allows the study to adapt to the observed data, making it potentially more efficient and likely to reach conclusive results.
\[ I_{\text{max}} = \left( \frac{Z_{\alpha/2} + Z_{\beta}}{\delta} \right)^2 \times \text{Inflation Factor} \]
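A minimal Python sketch of this formula follows. It assumes a two-sided α = 0.05 (not stated in the source), reads the 10% improvement in the psoriasis example as an absolute difference δ = 0.10, and treats the inflation factor as an input that would come from the chosen group-sequential boundaries (1.0 corresponds to a single final analysis). `max_information` is a hypothetical helper name.

```python
from statistics import NormalDist

def max_information(delta, alpha=0.05, power=0.95, inflation_factor=1.0):
    """I_max = ((z_{alpha/2} + z_beta) / delta)^2 * inflation factor.

    delta: treatment difference to detect (here an absolute difference in rates).
    inflation_factor: depends on the group-sequential boundaries; 1.0 means
    a single final analysis with no interim looks.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    z_beta = z.inv_cdf(power)             # the quantile written Z_beta in the text
    return ((z_alpha + z_beta) / delta) ** 2 * inflation_factor

# Psoriasis example: 95% power for delta = 0.10, no interim looks.
print(round(max_information(delta=0.10), 1))
```

For these inputs I_max is roughly 1,300. Note that π_c does not appear anywhere in the calculation, which is why the maximum information can be fixed at the design stage even when the placebo rate is uncertain.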
The table indicates that the maximum statistical information \(I_{\text{max}}\) is the same whatever the true placebo rate (π_c); what changes is the sample size \(N_{\text{max}}\) needed to accrue that information, because the variance of the estimated treatment effect depends on π_c.
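The relation behind the table can be sketched as follows, assuming 1:1 allocation, π_e = π_c + δ with δ = 0.10, and a two-sided α = 0.05 with 95% power. Since information is the inverse of Var(δ̂) = [π_c(1 - π_c) + π_e(1 - π_e)]/(N/2), the total sample size that delivers a fixed I_max is N_max = 2 · I_max · [π_c(1 - π_c) + π_e(1 - π_e)]. The helper names are hypothetical, and the printed numbers are illustrations, not the values in the source table.

```python
from math import ceil
from statistics import NormalDist

def max_information(delta, alpha=0.05, power=0.95, inflation_factor=1.0):
    z = NormalDist()
    return ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / delta) ** 2 * inflation_factor

def n_max(i_max, pi_c, delta):
    """Total sample size (1:1 allocation) that yields information i_max.

    Information = 1 / Var(delta_hat), with
    Var(delta_hat) = [pi_c(1-pi_c) + pi_e(1-pi_e)] / (N/2).
    """
    pi_e = pi_c + delta
    var_unit = pi_c * (1 - pi_c) + pi_e * (1 - pi_e)
    return ceil(2 * i_max * var_unit)

delta = 0.10
i_max = max_information(delta)
for pi_c in (0.05, 0.075, 0.10, 0.125, 0.15):
    print(f"pi_c={pi_c:.3f}  I_max={i_max:.0f}  N_max={n_max(i_max, pi_c, delta)}")
```

I_max is the same on every row, while N_max grows with π_c because rates closer to 50% carry more binomial variance.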
\[ J_j = \left[ \text{se}(\hat{\delta}_j) \right]^{-2} = \left[ \frac{\hat{\pi}_c (1 - \hat{\pi}_c) + \hat{\pi}_e (1 - \hat{\pi}_e)}{N_j/2} \right]^{-1} \]
- \([\text{se}(\hat{\delta}_j)]^{-2}\): the precision (inverse variance) of the estimated treatment effect at look j.
- \(N_j/2\): the number of patients per arm at look j, assuming an equal split of the sample size between treatment and control groups.
- \(\hat{\pi}_e\) and \(\hat{\pi}_c\): estimated rates of the endpoint for the experimental and control groups, respectively.
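A minimal sketch of the interim computation, assuming binary responses summarized as responder counts per arm. It estimates \(J_j\) with per-arm variances, which reduces to \(\left[\frac{\hat\pi_c(1-\hat\pi_c)+\hat\pi_e(1-\hat\pi_e)}{N_j/2}\right]^{-1}\) when each arm has \(N_j/2\) patients. If \(J_j < I_{\text{max}}\), it scales the sample size up on the assumption that information grows in proportion to N when the response rates stay the same. Efficacy and futility boundary checks are omitted; the function names and interim counts are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def max_information(delta, alpha=0.05, power=0.95, inflation_factor=1.0):
    z = NormalDist()
    return ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / delta) ** 2 * inflation_factor

def observed_information(x_c, n_c, x_e, n_e):
    """J_j = 1 / estimated Var(delta_hat) from responders x and patients n per arm."""
    p_c, p_e = x_c / n_c, x_e / n_e
    var_delta = p_c * (1 - p_c) / n_c + p_e * (1 - p_e) / n_e
    return 1.0 / var_delta

def reestimated_n_max(i_max, j_info, n_current):
    """Total N expected to reach I_max, assuming information is proportional to N."""
    return ceil(n_current * i_max / j_info)

i_max = max_information(delta=0.10)
# Hypothetical interim data: 15/150 placebo and 30/150 experimental responders.
j_info = observed_information(15, 150, 30, 150)
if j_info >= i_max:
    print("I_max reached: perform the final analysis")
else:
    print(f"J_j = {j_info:.0f} < I_max = {i_max:.0f}; "
          f"re-estimated N_max = {reestimated_n_max(i_max, j_info, 300)}")
```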
Sample size re-estimation (SSR) in clinical trials is a strategic approach employed when initial assumptions about a study need adjustment based on interim data. This can be crucial for ensuring the scientific validity and efficiency of a trial.
During an interim analysis, questions such as these might arise:
- Should the study continue to target a difference of δ = 2 points?
- Is the assumed standard deviation (σ = 7.5) still valid?
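To address these questions:
- Conditional Power (CP): the probability that the final test will be significant, given the interim results and an assumed value of the treatment effect (for example, the design value of δ or the interim estimate). The sample size may be increased to raise CP.
- Adjusting the Critical Cut-off: when the sample size is changed on the basis of unblinded interim data, the conventional final test can inflate the type-1 error rate, so the critical cut-off value (or, equivalently, the way the stage-wise data are combined) must be adjusted to preserve it.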
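Increasing the sample size can boost CP because it reduces the variance of the estimated treatment effect, which raises the expected value of the second-stage test statistic; when the true effect is positive, this makes it more likely that the final statistic \(Z_2\) will exceed the critical value \(c\).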
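The effect of the stage-2 sample size on CP can be sketched for a two-arm trial with a normally distributed endpoint, 1:1 allocation, and an inverse-normal combination of the two stages whose weights are fixed by the planned stage sizes. Under an assumed effect δ and standard deviation σ, the stage-2 z statistic based on \(n_2^*\) new patients has mean \(\delta\sqrt{n_2^*}/(2\sigma)\), so \(CP = 1 - \Phi\big((c - w_1 z_1)/w_2 - \delta\sqrt{n_2^*}/(2\sigma)\big)\). The interim values (n_1 = 220, planned n_2 = 222, z_1 = 1.2) are hypothetical; δ = 2 and σ = 7.5 are the design values from the questions above, and c = 1.96 assumes a one-sided α of 0.025.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()

def conditional_power(z1, n1, n2_planned, n2_new, delta, sigma, c=1.96):
    """CP of the weighted inverse-normal test after changing stage 2 to n2_new.

    z1: stage-1 z statistic from n1 patients (1:1 allocation).
    n2_planned: stage-2 size fixed at the design stage; it sets the weights.
    delta, sigma: treatment effect and SD assumed for stage 2.
    """
    w1 = sqrt(n1 / (n1 + n2_planned))
    w2 = sqrt(n2_planned / (n1 + n2_planned))
    drift = delta * sqrt(n2_new) / (2 * sigma)   # mean of the stage-2 z statistic
    return 1 - PHI.cdf((c - w1 * z1) / w2 - drift)

for n2_new in (222, 333, 444):
    cp = conditional_power(z1=1.2, n1=220, n2_planned=222, n2_new=n2_new,
                           delta=2, sigma=7.5)
    print(n2_new, round(cp, 3))
```

The weights stay at their planned values when `n2_new` changes; only the drift grows, which is the mechanism by which a larger stage 2 raises CP.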
These adjustments keep the study's conclusions valid even though the design was altered mid-trial. The aim is to preserve the trial's power (its ability to detect a true effect) without compromising its rigor through inflation of the type-1 error rate.
Cap on Increases: Often, increases in sample size are capped (e.g., no more than double the initial size) to prevent logistical and financial overextension.
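A capped re-estimation rule can be sketched by searching for the smallest stage-2 size that reaches a target CP, never letting the total exceed twice the original plan. The 90% target, the doubling cap, and the fallback of using the full cap when the target is out of reach are illustrative assumptions, not a rule taken from the source; the CP function repeats the inverse-normal model with weights fixed at the planned stage sizes (normal endpoint, 1:1 allocation, c = 1.96).

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()

def conditional_power(z1, n1, n2_planned, n2_new, delta, sigma, c=1.96):
    w1 = sqrt(n1 / (n1 + n2_planned))
    w2 = sqrt(n2_planned / (n1 + n2_planned))
    drift = delta * sqrt(n2_new) / (2 * sigma)
    return 1 - PHI.cdf((c - w1 * z1) / w2 - drift)

def reestimate_stage2(z1, n1, n2_planned, delta, sigma,
                      target_cp=0.9, cap_factor=2.0):
    """Smallest stage-2 size reaching target_cp, capped so that the total
    never exceeds cap_factor times the originally planned total."""
    n2_cap = int(cap_factor * (n1 + n2_planned)) - n1
    for n2 in range(n2_planned, n2_cap + 1):
        if conditional_power(z1, n1, n2_planned, n2, delta, sigma) >= target_cp:
            return n2
    return n2_cap  # target not reachable within the cap

print(reestimate_stage2(z1=1.2, n1=220, n2_planned=222, delta=2, sigma=7.5))
```

A strong interim result keeps the planned size, a very weak one runs into the cap, and only intermediate results lead to a tailored increase.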
Zones of Adjustment:
Placebo Response and Treatment Effect: The expected placebo response rate is 35%, with an anticipated 20% improvement with crofelemer treatment. These assumptions are critical for calculating the necessary sample size and for power calculations to ensure the study is adequately powered to detect a clinically meaningful effect.
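To show how these assumptions drive the sample size, here is a sketch of the standard normal-approximation formula for comparing two proportions. It reads the 20% improvement as an absolute difference (35% on placebo versus 55% on crofelemer) and assumes a two-sided α = 0.05, 80% power, and 1:1 allocation for a single dose versus placebo; none of these choices is stated in the source, and `n_per_arm` is a hypothetical helper.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_arm(p_c, p_e, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided test of two proportions (1:1)."""
    z = NormalDist()
    z_a, z_b = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    p_bar = (p_c + p_e) / 2                       # pooled rate under H0
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p_c * (1 - p_c) + p_e * (1 - p_e))) ** 2
    return ceil(num / (p_e - p_c) ** 2)

print(n_per_arm(0.35, 0.55))
```

Because the required size scales roughly with 1/δ², a true difference even a few points smaller than assumed raises the sample size noticeably, which is the motivation for re-estimation discussed next.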
Implications for Sample Size Re-Estimation: Given the uncertainty in the optimal dose and variability in the placebo response, an adaptive trial design with sample size re-estimation could be considered. This approach would allow adjustments based on interim analysis results, potentially optimizing the study design in real-time to ensure sufficient power and minimize unnecessary exposure to less effective doses.
Interim Analyses: Conducting interim analyses would allow for the assessment of preliminary efficacy and safety data. Based on these data, decisions could be made about continuing, modifying, or stopping the trial for futility or efficacy.
Adjustments Based on Conditional Power: If interim results suggest changes in the estimated placebo response or differentially greater efficacy at specific doses, the sample size could be adjusted to ensure that the study remains adequately powered to detect significant treatment effects.
1. Inverse Normal Combination of Stage 1 and Stage 2 p-values
This method combines the p-values from the two stages of the study using a weighted Z-transform (inverse normal) approach:
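\[ Z_2 = \sqrt{\frac{n_1}{n_1 + n_2}} \, \Phi^{-1}(1 - p_1) + \sqrt{\frac{n_2}{n_1 + n_2}} \, \Phi^{-1}(1 - p_2) \]
where \(p_1\) and \(p_2\) are the one-sided p-values computed from the stage 1 and stage 2 data, \(\Phi^{-1}\) is the inverse of the standard normal cumulative distribution function, and \(n_1\) and \(n_2\) are the planned stage 1 and stage 2 sample sizes. The weights use the planned \(n_2\) even if the stage-2 sample size is later changed.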
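Because the weights are fixed in advance and, under the null hypothesis, \(\Phi^{-1}(1 - p_1)\) and \(\Phi^{-1}(1 - p_2)\) are independent standard normal variables, \(Z_2\) is standard normal under the null hypothesis even when the stage-2 sample size is chosen after looking at the stage-1 data. Comparing \(Z_2\) with the usual critical value therefore controls the Type I error while still using the information from both stages.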
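A minimal sketch of the inverse normal combination test, assuming one-sided stage-wise p-values strictly between 0 and 1 and a one-sided α = 0.025. The weights come from the planned stage sizes, so the same weights apply whatever stage-2 sample size is actually used. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()

def inverse_normal_z(p1, p2, n1, n2_planned):
    """Weighted inverse-normal combination of one-sided stage-wise p-values.

    Weights use the *planned* stage sizes, so they stay fixed even if the
    stage-2 sample size is changed at the interim analysis.
    """
    w1 = sqrt(n1 / (n1 + n2_planned))
    w2 = sqrt(n2_planned / (n1 + n2_planned))
    return w1 * PHI.inv_cdf(1 - p1) + w2 * PHI.inv_cdf(1 - p2)

def reject(p1, p2, n1, n2_planned, alpha=0.025):
    """Reject H0 if the combined statistic reaches the one-sided critical value."""
    return inverse_normal_z(p1, p2, n1, n2_planned) >= PHI.inv_cdf(1 - alpha)

print(reject(0.025, 0.025, n1=100, n2_planned=100))
```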
2. Closed Testing
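Closed testing is a rigorous method for strong control of the family-wise error rate (FWER) when several hypotheses are tested. An individual hypothesis is rejected only if every intersection hypothesis that contains it is also rejected by a valid level-α test, so the FWER is controlled whatever the dependence between the tests.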
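A brute-force sketch of the closed testing principle, using a Bonferroni test for each intersection hypothesis; other valid local tests (for example Dunnett, or Simes when its assumptions hold) could be substituted. With Bonferroni local tests the result coincides with Holm's step-down procedure. Enumerating all intersections is exponential in the number of hypotheses, which is acceptable for the handful of arms in a typical trial. The function name and p-values are hypothetical.

```python
from itertools import combinations

def closed_test_bonferroni(pvals, alpha=0.05):
    """Hypotheses rejected by closed testing with Bonferroni local tests.

    pvals: dict mapping hypothesis name -> unadjusted p-value.
    """
    names = list(pvals)
    all_subsets = [frozenset(s) for k in range(1, len(names) + 1)
                   for s in combinations(names, k)]
    # Local Bonferroni test of each intersection H_I: reject if min p <= alpha/|I|.
    rejected_int = {s for s in all_subsets
                    if min(pvals[h] for h in s) <= alpha / len(s)}
    # H_h is rejected only if every intersection containing h is rejected.
    return {h for h in names
            if all(s in rejected_int for s in all_subsets if h in s)}

print(closed_test_bonferroni({"A": 0.01, "B": 0.04, "C": 0.30}))
```

Here B fails because the intersection of B and C, with minimum p-value 0.04, exceeds 0.05/2, even though B alone would pass at 0.05.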
Features of MAMS (multi-arm, multi-stage) designs:
Multiple Treatment Arms: Involves comparing several treatment options against a common control group, allowing simultaneous evaluation of multiple interventions.
Multiple Interim Analyses: Scheduled assessments of the accumulating data at multiple points during the trial. These interim looks allow for early decisions about the continuation, modification, or termination of treatment arms.
Early Stopping Rules: The trial can be stopped early for efficacy if a treatment shows clear benefit, or for futility if it’s unlikely to show benefit by the end of the study.
Continuation with Multiple Winners: Unlike traditional designs that might stop after finding one effective treatment, MAMS design can continue to evaluate other promising treatments.
Dropping Losers: Ineffective treatment arms can be discontinued at interim stages, focusing resources on more promising treatments.
Dose Selection: Flexibility to adjust doses or select the most effective dose based on interim results.
Sample Size Re-estimation (SSR): Sample sizes can be recalculated based on interim data to ensure adequate power is maintained throughout the trial, especially useful if initial estimates of effect size (δ) or variability (σ) are inaccurate.
Control of Type-1 Error: Despite the complexity and multiple hypothesis testing involved, the design includes methodologies to maintain strong control over the type-1 error rate, ensuring the validity of the trial’s conclusions.
Trial Details:
- Intervention: three doses of vericiguat compared with placebo.
- Primary Endpoint: week-12 reduction in the log of NT-proBNP, a biomarker used to assess heart function and heart failure.
- Sample Size and Power: a total of 388 patients, giving 80% power to detect a change of δ = 0.187 in log NT-proBNP, assuming a standard deviation (σ) of 0.52.
Adaptive Features:
- Adaptive Design Considerations: the trial was prepared for values of δ and σ different from those initially assumed, which matters if the biological effect of vericiguat or the variability in NT-proBNP measurements was misestimated.
- Interim Analyses with SSR and Drop the Loser: the design included interim analyses to reassess the continued relevance of each dose. Less promising doses could be dropped ('Drop the Loser'), and the sample size could be recalculated from the data gathered to that point ('SSR').
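A sketch of how 'Drop the Loser' can be combined with closed testing and the inverse normal combination, assuming three doses versus placebo, one-sided stage-wise p-values, Bonferroni tests for the intersection hypotheses at stage 1, and only the selected dose continuing to stage 2. Because stage-2 data exist only for the selected dose, its stage-2 p-value serves every intersection that contains it. The weights are fixed by the planned stage sizes, so a stage-2 sample size re-estimation can be layered on top. Dose labels, p-values, and stage sizes are hypothetical, not values from the trial described here.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()

def drop_the_loser_test(p_stage1, selected, p2_selected, n1, n2_planned, alpha=0.025):
    """Closed-testing decision for the dose carried into stage 2.

    p_stage1: dict dose -> stage-1 one-sided p-value versus placebo.
    p2_selected: stage-2 p-value of the selected dose (the only continuing dose).
    """
    w1 = sqrt(n1 / (n1 + n2_planned))
    w2 = sqrt(n2_planned / (n1 + n2_planned))
    crit = PHI.inv_cdf(1 - alpha)
    doses = list(p_stage1)
    for k in range(1, len(doses) + 1):
        for subset in combinations(doses, k):
            if selected not in subset:
                continue
            # Bonferroni p-value of this intersection hypothesis at stage 1.
            p1_int = min(1.0, k * min(p_stage1[d] for d in subset))
            if p1_int >= 1.0:
                return False
            z = w1 * PHI.inv_cdf(1 - p1_int) + w2 * PHI.inv_cdf(1 - p2_selected)
            if z < crit:
                return False  # an intersection containing H_selected survives
    return True

p1 = {"low": 0.20, "mid": 0.04, "high": 0.01}
best = min(p1, key=p1.get)  # select the dose with the smallest stage-1 p-value
print(best, drop_the_loser_test(p1, best, 0.005, n1=100, n2_planned=100))
```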
Treatment Arms:
- TRC105 + Pazopanib: TRC105 targets the endoglin receptor and is combined with Pazopanib, which targets the VEGF receptor.
- Pazopanib Alone: standard of care, serving as the control.
Subgroups:
- Two primary subgroups, cutaneous and visceral. The cutaneous subgroup is notably more sensitive to TRC105, suggesting potential subgroup-specific efficacy.
Decisions at the Interim Analysis:
- Favorable Results: the trial continues as planned.
- Promising but Uncertain Results: the trial may increase the sample size to enhance statistical power.
- Unfavorable Results for Combined Therapy: the trial continues as planned or stops for futility, depending on the interim findings.
- Population Enrichment: if the cutaneous subgroup appears particularly responsive, the trial may restrict further enrollment to this subgroup, enriching the population with the patients most likely to benefit.
Statistical Methodology for Decision Making:
- Combining P-values: the decision uses weighted Z-transforms (an inverse normal combination) of the p-values obtained before and after the interim analysis.
- Weights: \(w_1\) and \(w_2\) are pre-specified weights for the stage 1 and stage 2 p-values, usually based on the planned stage sizes and chosen so that \(w_1^2 + w_2^2 = 1\); fixing them in advance is what preserves the type-1 error rate after the adaptation.
1. In Case of No Enrichment
When there is no enrichment, i.e., the trial continues with the full patient population, the null hypothesis for the full population is rejected only if both of the following hold:
\[ w_1 \Phi^{-1}(1 - p_1^{FS}) + w_2 \Phi^{-1}(1 - p_2^{FS}) \geq Z_{\alpha} \]
\[ w_1 \Phi^{-1}(1 - p_1^F) + w_2 \Phi^{-1}(1 - p_2^F) \geq Z_{\alpha} \]
Where:
- \(\Phi^{-1}\) is the inverse of the standard normal cumulative distribution function.
- \(p_1^{FS}\) and \(p_2^{FS}\) are the stage 1 (before the interim analysis) and stage 2 (after the interim analysis) p-values for the intersection hypothesis that the treatment has no effect in either the full population or the subgroup.
- \(p_1^F\) and \(p_2^F\) are the stage 1 and stage 2 p-values for the full population.
- \(w_1\) and \(w_2\) are the weights assigned to the p-values from each respective stage.
- \(Z_{\alpha}\) is the critical value from the standard normal distribution corresponding to the desired overall Type I error rate, \(\alpha\).

Requiring both inequalities is an application of closed testing: the full-population hypothesis can be rejected only if the intersection hypothesis is rejected as well.
2. In Case of Enrichment
When the trial opts for enrichment, i.e., restricts further enrollment to a specific subgroup (e.g., the cutaneous subgroup) after finding differential treatment effects, the null hypothesis for the subgroup is rejected only if both of the following hold:
\[ w_1 \Phi^{-1}(1 - p_1^{FS}) + w_2 \Phi^{-1}(1 - p_2^{FS}) \geq Z_{\alpha} \]
\[ w_1 \Phi^{-1}(1 - p_1^S) + w_2 \Phi^{-1}(1 - p_2^S) \geq Z_{\alpha} \]
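Where:
- \(p_1^S\) and \(p_2^S\) are the stage 1 and stage 2 p-values for the enriched subgroup (e.g., cutaneous). Because only subgroup patients are enrolled after the interim analysis, the stage 2 intersection p-value \(p_2^{FS}\) is computed from subgroup data alone.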
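A sketch of both decision rules, assuming one-sided p-values, a one-sided α = 0.025, and a Simes test for the intersection hypothesis (a Bonferroni test would also be valid). With no enrichment, stage 2 enrolls both subgroups, so the stage-2 intersection p-value uses both stage-2 p-values; with enrichment, only subgroup patients enter stage 2, so the subgroup's stage-2 p-value stands in for the intersection. Function names, weights, and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()

def simes2(p_a, p_b):
    """Simes p-value for the intersection of two hypotheses."""
    lo, hi = sorted((p_a, p_b))
    return min(2 * lo, hi)

def combined_z(p1, p2, w1, w2):
    """Weighted inverse-normal statistic; a p-value of 1 contributes -infinity."""
    if p1 >= 1.0 or p2 >= 1.0:
        return float("-inf")
    return w1 * PHI.inv_cdf(1 - p1) + w2 * PHI.inv_cdf(1 - p2)

def significant(p1_F, p1_S, p2_S, w1, w2, enrich, p2_F=None, alpha=0.025):
    """Closed-testing decision: H_F without enrichment, H_S with enrichment."""
    z_alpha = PHI.inv_cdf(1 - alpha)
    p1_FS = simes2(p1_F, p1_S)
    if enrich:
        # Stage 2 enrolls only the subgroup, so p2_FS is the subgroup p-value.
        return (combined_z(p1_FS, p2_S, w1, w2) >= z_alpha
                and combined_z(p1_S, p2_S, w1, w2) >= z_alpha)
    p2_FS = simes2(p2_F, p2_S)
    return (combined_z(p1_FS, p2_FS, w1, w2) >= z_alpha
            and combined_z(p1_F, p2_F, w1, w2) >= z_alpha)

w = sqrt(0.5)  # equal planned stage sizes (hypothetical)
print(significant(0.03, 0.01, 0.02, w, w, enrich=False, p2_F=0.01))
print(significant(0.20, 0.02, 0.01, w, w, enrich=True))
```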
Because the disease is rare and recruitment is very challenging, it is easier to start small and ask for more:
- Start with an initial sample size of 125: the trial begins with a manageable cohort and adjusts based on early insights.
- Adaptations Based on Zones:
  - If in the promising zone: increase to 200 to enhance statistical power and confirm preliminary findings.
  - If in the enrichment zone: increase to 180, concentrating on the subgroup showing better responsiveness.
  - If in the favorable zone: maintain the current sample size, since the interim results already indicate a high chance of success, potentially accelerating the trial's conclusion.
Zone: categorizes the possible outcomes of the interim analysis into four zones:
- Enrich: focusing on a specific subgroup (e.g., cutaneous) might be beneficial.
- Unfav (Unfavorable): results are not promising, potentially leading to stopping the trial for futility.
- Prom (Promising): interim results suggest potential efficacy that could be confirmed with a larger sample size.
- Favor (Favorable): strong evidence of efficacy as planned, potentially allowing a quicker conclusion or regulatory submission.
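The zone logic and the sample sizes quoted above can be sketched as a simple lookup. The conditional-power cut-offs used to assign zones (0.8 for favorable, 0.3 for the lower edge of the promising zone, 0.5 for the subgroup CP that triggers enrichment) and the order in which zones are checked are hypothetical placeholders; the source gives the resulting sample sizes (125, 200, 180) but not the zone boundaries.

```python
def interim_zone(cp_full, cp_subgroup, fav=0.8, prom_low=0.3, enrich_min=0.5):
    """Assign an interim zone and total sample size (hypothetical CP cut-offs)."""
    if cp_full >= fav:
        return "Favor", 125          # keep the initial sample size
    if cp_full >= prom_low:
        return "Prom", 200           # increase to boost power
    if cp_subgroup >= enrich_min:
        return "Enrich", 180         # restrict further enrollment to the subgroup
    return "Unfav", 125              # continue as planned or stop for futility

print(interim_zone(0.45, 0.60))
```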
Cytel Webinars: Introduction to Design and Monitoring of Adaptive Clinical Trials